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Abstract: We propose a Bayesian approach to detect multiple change-points in a 
piecewise-constant signal corrupted by a functional part corresponding to environ¬ 
mental or experimental disturbances. The piecewise constant part (also called seg¬ 
mentation part) is expressed as the product of a lower triangular matrix by a sparse 
vector. The functional part is a linear combination of functions from a large dic¬ 
tionary. A Stochastic Search Variable Selection approach is used to obtain sparse 
estimations of the segmentation parameters (the change-points and the means over 
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the segments) and of the functional part. The performance of our proposed method 
is assessed using simulation experiments. Applications to two real datasets from 
geodesy and economy holds are also presented. 


Key words: Segmentation, Corrupted series, Dictionary approach, Stochastic search 
variable selection 


1 Introduction 

The problem of detecting multiple change-points in signals arises in many fields such 
as biology (Boys and Henderson, 2004), geodesy (Williams, 2003; Bertin et ah, 2016), 
meteorology (Caussinus and Mestre, 2004; Fearnhead, 2006; Wyse et ah, 2011; Rug- 
gieri, 2013) or astronomy (Dobigeon et ah, 2007) among others. In addition to change- 
points, we may observe environmental or experimental disturbances (for instance geo¬ 
physical signals or climatic effects) which need to be taken into account in the models. 
Since the form of these disturbances is in general unknown or partially unknown, it 
seems natural to model them as a functional part that has to be estimated. Our goal 
in this article is to develop a Bayesian approach that allows us to both estimate the 
segmentation part (the change-points and the means over the segments) and the func¬ 
tional part. The Bayesian approach has the advantage that expert knowledge can be 
introduced in the models through prior distributions. This can be useful in multiple 
change-points problems where change-points can be related to specific events such 
as instrumental changes, earthquakes, very hot years or months or economic crisis 
for example. Moreover, posterior distributions allow us for a quantification of the 
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uncertainty, giving in particular posterior probabilities or credible intervals for the 
positions of change-points or the functional part. This is of particular interest for 
practitioners. 

Several methods have been proposed in a Bayesian framework for the multiple change- 
points problem. These methods are based, mostly, on reversible jump Markov Chain 
Monte Carlo algorithms (Lavielle and Lebarbier, 2001; Boys and Henderson, 2004; 
Tai and Xing, 2010), Stochastic search Variable Selection (Dobigeon et ah, 2007), 
dynamic programming recursions (Ruggieri, 2013) or non-parametric Bayesian ap¬ 
proaches (Martinez, and Mena, 2014 and references there in). All these Bayesian 
methods deal with the multiple change-points problem but they do not consider the 
presence of functional disturbances. However, as illustrated in Picard et al. (2011) 
and Bertin et ah (2016) in simulation and real examples, taking into account the 
functional part in the segmentation model can be crucial for an accurate change- 
point detection and interesting information can be extracted from the form of the 
functional part. 

We propose a novel Bayesian method to detect multiple change-points in a piecewise- 
constant signal corrupted by a functional part, where the functional part is esti¬ 
mated using a dictionary approach (Bickel et ah, 2009) and the segmentation part 
is treated as a sparse problem. More precisely, concerning the segmentation, we fol¬ 
low Harchaoui and Levy-heduc (2010) by expressing the piecewise constant part of 
the model as a product of a lower triangular matrix by a sparse vector (which non¬ 
zero coordinates correspond to change-points positions). In addition, the functional 
part is represented as a linear combination of functions from a dictionary. Since a 
large variety of functions can be included in the dictionary, this leads generally to a 
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sparse representation of the functional part in terms of functions from the dictionary. 
Hence, a Stochastic search Variable Selection approach can be used to estimate the 
sparse vectors, that is, both the location of the change-points and the functional part 
(Georges and McCulloch, 1997). 

In the simulation studies and for real datasets, we obtain good results for both the 
segmentation and the functional parts. On a GPS series from an Australian station 
and on a series of daily records of Mexican Peso/US Dollar exchange rate, expected 
change-points are recovered. Moreover the benefits of the Bayesian approach are also 
illustrated in the simulation and real data studies. In particular, for the GPS series, 
the use of prior knowledge about instrumental changes enables us to detect some 
relevant change-points and, in the functional part, periodic components suggested in 
previous works. 

The remainder of the paper is organized as follows. Section 2 presents the hierarchical 
Bayesian model considered, Section 3 outlines the procedure used to estimate the 
model parameters. In Section 4, the performance of the proposed method is studied 
through simulations. The procedure is illustrated on the two real datasets in Section 
5, and finally Section 6 discusses it. 
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2 Model 

2.1 Segmentation model with functional part 

We observe a series Y = (Y \,..., Y n )' that satisfies 

Y t = Hk + f(x t ) + e t , Vtel k = (T k -i,T k ],ke{i,...,K}, (2.1) 

where K is the total number of segments of the series is unknown, the e t are i.i.d 
centered Gaussian variables with variance a 2 , x t is a covariate (the simple one is the 
time t), f is an unknown function to be estimated, Tk is the fcth change-point, pk is 
the mean of the series on the segment Ik.. We use the convention To = 0 and tk = n. 

A classical approach in non-parametric framework is to expand the functional part / 
with respect to orthonormal basis, such as Fourier or wavelet ones (see Hardle et ah, 
1998 and references therein). Following Bickel et al. (2009) or Bertin et ah (2016), we 
choose here to adopt a dictionary approach, that consists in finding an over-complete 
representation of /. More precisely, we expand / with respect to a large family of 
functions (0j)j=i,...,M, named dictionary, that can for example be the union of two 
orthonormal basis. Then / is assumed to be of the form 

M 

f 0) 

3 = 1 

where A = (Ai,..., A m)' £ R AI is a vector of coordinates of / in the dictionary and 

(/(*!),...,/(*»))' = FA, 

where F is the n x M matrix F = ((f>j(xi))ij. Note that since large dictionaries are 
considered, this allows us to obtain a sparse representation of the function /, that is 
the vector A is expected to be with few non-zero coordinates. 
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To estimate the change-points in the series, we follow the strategy proposed by Har- 
chaoui and Levy-Leduc (2010), which consists in refraining this task in a variable 
selection context. We denote by X the n x n lower triangular matrix having only l’s 
on the diagonal and below it. We consider the n x 1 vector /3 with only K non-zero 
coefficients at positions (77 + l)fc=o,...,A-i with (3 Tk+ 1 = Hk+i — Hk and using the con¬ 
vention /i 0 = 0. Note that the segmentation (the change-points 7 and the means /ifc) 
will be recovered by the vector (3. 

The model (2.1) can then be rewritten as follows 

Y = X/3 + FA + e, 

where e = (ey,... ,£ n )'. Our objective is now to estimate the parameters /3, A and 
cr 2 . Since both f3 and A vectors are expected to be sparse, we propose to use Bayesian 
methods of variable selection for their estimation. 

2.2 Bayesian hiercharchical framework 

Following George and McCulloch (1993), we first introduce latent variables 7 and r 
to identify non-null components of the vectors f3 and A. The vector 7 = ( 71 ,..., y n ) 
is such that 7 * = i/ 3 ^ 0 }) where / denotes the indicator function and the vector 
r = ( 77 ,... ,r M ) satisfies ?7 = /{a^o}- The number of non-zero coordinates of 7 and 
r are d 7 = K and d r respectively. The product X/3 is equal to X 7 /3 7 where X 7 is 
the nxd 7 matrix containing only the j columns of A" such that 7 j is non-zero and 
/3 7 is a d 7 x 1 vector containing only the non-zero coefficients of /3. Similarly, we can 
express FA as F r A r where F r is an x d r matrix and A r a d r x 1 vector. The model 
( 2 . 1 ) can be then rewritten as 
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Y — X 7 / 3 7 + F r A r + s 


where the parameters to estimate are 6 = {/3 , 7 , A r , r, a 2 }. 

Then, as usual in a Bayesian context, these parameters are treated as random vari¬ 
ables, assumed here to be independent, and we consider the following prior distri¬ 
butions. The 7 i are independent Bernoulli variables with parameter 0 < 77 < 1 for 
i = 2,... , n and with 77 = 1 by convention. The ?7 are also independent Bernoulli 
variables with parameter 0 < rjj < 1 for j = 1,... , M. Then the noise parameter 
follows a Jeffrey distribution, 7r(cr 2 ) oc <t~ 2 . The conditional distribution of /3 7 |7,a 2 
is the classical g-prior of Zellner (1986) given by /3 7 |7, cr 2 ~ J\f ( p ^0, Ci<7 2 (X 7 X 7 ) 1 j. 
Finally the conditional distribution of A r |r,a 2 is also a g-prior, with A r |r,cr 2 ~ 


A f dr (0,c 2 a 2 ( F'rFr)- 1 ). 


The posterior distribution of 6 has the following expression 


7t(Y|0)7t(/ 3 7 |7, <7 2 )7 r, cr 2 )7r(7)7r(r)7r(o- 2 ) 



( 2 . 2 ) 


where 


7t(Y J 0) 



2 


exp 



3 MCMC Schemes 


A classical approach for the computational scheme would be to estimate the whole 
parameters at the same time (/ 3 7 , 7, A r , r, o 2 ) using a Metropolis-within-Gibbs algo¬ 
rithm combined with the grouping (or blocking) technique of Liu (1994). Indeed, 
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/3 7 and 7 , as well as A r and r, cannot be considered separately. An iteration of the 
algorithm would be made of three steps: update of / 3 7 ,7 | A r ,r,cr 2 ,Y, update of 
A r ,r | /3 , 7 , <r 2 ,Y and update of a 2 | /3 7 , 7 , A r , r, Y. However some drawbacks are 
associated with this algorithm. Firstly the update rates for (/3 7 ,7 ) and (A r ,r) will 
be very low since it is difficult to make good proposals for both /3 1 and 7 or for both 
A r and r. Secondly, the post-hoc interpretation of the vectors (3 1 and A r obtained 
at each iteration of this algorithm are not relevant. Indeed, they are not associated 
with the same change-points and functions from one iteration to the next. Besides, 
as explained in Lavielle and Lebarbier (2001) (see Section 4), in a Bayesian segmen¬ 
tation framework, the posterior mean of /3 , obtained from the posterior distribution 
of (/3 , 7 ), is uninteresting. The interpretation of this mean is not obvious since it is 
calculated over all the possible configurations of change-points. A solution is to use 
the posterior distribution of (3 conditionally to 7 . We have the same drawback for 
the functional part. 

For these two reasons, we propose the following two-step strategy: the first step aims 
at detecting the positions of the change-points and at selecting the functions, that is, 
to estimate the latent vectors 7 and r. To this end, the parameters (3 , A r and a 2 
can be considered as nuisance parameters, and we use the joint posterior distribution 
integrated with respect to (3 , A r and a 2 . This can be viewed as a collapsing technique, 
see Liu (1994) and van Dyk and Park (2008). In the second part, we estimate (3 , 
A r and a 2 , conditionally to 7 and r. The MCMC scheme would then be as follows: 

1. Estimation of 7 and r: use of a Metropolis-Hastings algorithm to draw from 
the joint posterior distribution 71 ( 7 , r|Y) integrated with respect to /3 7 , A r and 
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2. Estimation of (3 , A r and a: given the estimates 7 and r, use of a Gibbs sampler 
algorithm. 


In the following subsections, we give some details of both steps. 

3.1 Metropolis-Hastings algorithm 


The joint posterior distribution integrated with respect to /3 , A r and a 2 is the fol¬ 
lowing (see details in Appendix A): 

tt( 7, r|Y) oc (1 + c 1 )"^ /2 7r(7)7r(r)5f(7, r, Y), 


where 


0(7, r,Y) = 


/ 

V 

x 


( F 'd 

Y 1 + i) ] 

y 1 


M F'rFr )" 1 



1/2 


-1 


Iy' ( u- 1 - u~ 1 F r r f; r u; 1 + ^) F r ) f;u^ ) y 


-n/2 


and 


Uy = ( / - 

7 ' 1 + Cl 


x 7 (x;x 7 ) _1 x; 


-1 


To sample from 7r(7, r|Y), a Metropolis-Hastings algorithm is used. At iteration t , a 
candidate (7 *,r*) is proposed from (7^)7^), and using symmetric transition kernel 
the acceptance rate is: 

7r(7*, r *|Y) 


P{( (7V*)) = m in \ 1 , 


7r(7(*), rh)|Y) 


To have a symmetric transition kernel, two kinds of proposals are used (each one of 
probability 1/2): either k components of 7® are randomly changed, or l components 
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of are randomly changed (a value of 0 is switched to 1, and conversely). We modify 
only one of the two latent vectors at each iteration since proposals with modifications 
of both vectors have too low acceptance rates. In this algorithm, by convention, we 
suppose 71 = 1 (time 0 is a change-point, corresponding to r 0 = 0 ) and rq = 1 (the 
constant function is always selected). 

The number of iterations of this algorithm is b + m, where b corresponds to the burn- 
in period. Then 7 and r are estimated using the sequences { 7 ®} and {r^}, for 
t = b + 1,..., b + 771 . The most relevant positions for the change-points and the most 
relevant functions for the functional part are those which are supported by the data 
and prior information. In other words, they are those corresponding to the 7 and r 
components with higher posterior probabilities. I 11 practice, the selected components 
are those such their posterior probability is higher than a given threshold. As in 
Muller, Parmigiani, Robert and Rousseau (2004) or Muller, Parmigiani and Rice 
(2006), we choose a threshold that minimized a loss function. Here the considered 
loss function is the sum of the False Discovery and of the False Negative (FD + FN ), 
leading to a threshold of 1/2. This also corresponds to the selection of the median 
probability model in Barbieri and Berger (2004), which has been shown to have greater 
predictive power than the most probable model, under many circumstances. 

3.2 Gibbs sampler algorithm 

Once 7 and r have been estimated, our goal is to estimate /3 7 , A r and o 2 from the 
distribution 7 r(/ 3 7 , A r , cr 2 |r, 7 , Y) oc 7 t(/ 3 7 , A r , a 2 , r, 7 |Y). 


A Gibbs sampler algorithm is then used. At each iteration, the three parameters 
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should be drawn from its full conditional distribution given by: 



a 2 ’ V’ 

W r F' r (Y - X 7 /3 7 ) w ^ 

/t-2 ’ Wr 


where T 7 = a 2 [^X^xJ \ W r = a 2 \^F' r F T 


-l 


, a = | ^ + y and 


b = (Y - X 7 /3 7 - F r A r )' (Y - X 7 /3 7 - F r A r ) + (3\ /3 7 + A' r 




To estimate /3 , A r and a 2 , empirical posterior means are computed using only post¬ 


burn-in iterations. Using the estimators f3 and A of (3 and A, we then obtain the 
estimator /(•) = °f the function /, the estimators % of the change- 


points and the estimators /A of the means (see Section 2.1). The estimated number 


of change-points is given by K = I a. 


i =1 


4 Simulation study 

In this section, we conduct a simulation study to assess the performance of our pro¬ 
posed method to estimate both the ’’parametric” part (the segmentation part) and 
the“non-parametric” part (the functional part). Our method is called SegBayesSP 
for “semi-parametric”. Moreover in order to investigate the impact of taking into 
account the functional part on the estimation of the segmentation part, we com¬ 
pare the results of SegBayesSP with those of the same Bayesian procedure that 
includes only the segmentation part in the model (i.e. if the model is supposed to be 
Y t = pk + £t, Vf G Ik — {jk~ i, Tk],k G {1,..., A'}). This case is called SegBayes_P 












12 Meili Baragatti et a 1. 


for “parametric”. The estimation is still obtained using a Metropolis-Hastings algo¬ 
rithm to estimate 7 followed by a Gibbs sampler to estimate /3 and cr 2 . Section 4.1 
contains our simulation design, the parameters needed for the procedures and the 
quality criteria. Section 4.2 gives the results. 

4.1 Simulation design, parameters of the procedures and qual¬ 
ity criteria 

Simulation design. We consider model (2.1) with x t — t and the function / which 
is a mixture of a sine function with three peaks: 

f{t) = 0.3 x sin ^27r—j + 1.5/ t= o.ixn — 2 / t= 0 . 5 xn + 3 It=o.6xn■ (4-1) 

Note that this function contains both smooth components and local irregularities (see 
plot (h) of Figure 3). We simulate 100 series of length n = 100 with K = 4 segments 
and the mean of each segment takes a value in {0,1, 2, 3,4,5} randomly. The positions 
of the three change-points are randomly chosen with the following constraints: they 
are positioned at a distance from the peaks of at least 3, and each segment is at 
least of length 5. In order to consider several change-point detection difficulties, four 
levels of noise cr are considered: cr € (0.1, 0.5,1,1.5} (the more cr increases, the more 
difficult is the detection). 

Parameters. Several parameters or quantities need to be fixed in the procedures. 
For the procedure SegBayesSP, we consider the following dictionary which contains 
151 functions: 128 Haar functions (t 1 —> 2 7 / 2 D[ 01 ](|k( _ kj , k = 0,..., 2 7 — 1), the 
Fourier functions (t (-)■ sin (27 TjG) , t (-)■ cos (27rj^), j = 1, ...,10), the functions 
t 1 —y t and t t—>■ t 2 , and the constant function. Note that Table 1 gives the indexes 
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of some of these functions (those of / given by (4.1) are in particular 11, 51, 61 and 

110 ). 

For both procedures, we use the same prior parameters. With respect to the Metropolis- 
Hastings algorithm and the Gibbs sampler, we run each one for 20000 iterations in¬ 
cluding 5000 burn-in iterations. The parameters c\ and C 2 are fixed to 50, which is 
quite standard and recommended for instance by Smith and Kolin (1997). The initial 
numbers of segments and dictionary functions are 3. The number of change-points 
proposed to be changed at each iteration is 2, as well as the number of functions of 
the dictionary. The initial probability for each position of change-point is 0.01, as 
well as the initial probability for each function of the dictionary. To select relevant 
change-points and functions, we used a threshold equals to 1/2 (see section 3.1 for a 
justification of this threshold). 


Quality criteria. To study the performance of the procedures, the same criteria 
are used for both the segmentation and the functional parts. The first one is the root 
mean squared error (RMSE) that allows us to assess the quality of the estimation. 
The other ones (the false discovery rate (FDR) and the false negative rate ( FNR )) 
evaluate the quality of detection of the change-points and selection of functions. These 
criteria are detailed in the following: 

• For the segmentation part, 

— the root mean squared distance between the true mean and its estimate 
is RMSE(p) = yj 4 )CZ =1 (/i(t) - hit)) 2 , with A lit) = hk for t G I k = 
(Tfc-i, T fc ], k = 1,..., K and p(t) = for t e h = (?*_!, f fc ], k — 
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— the proportion of erroneously detected change-points among detected change- 
points, denoted FDRbp , and the proportion of undetected change-points 
among true change-points denoted FNRbp. 

• For the functional part, 

— the root mean squared distance between / and its estimate: RMSE(f) = 

— the proportion of erroneously detected functions among detected functions, 
denoted FDR(f), and the proportion of undetected functions among true 
functions denoted FNR(f). 

Note that a perfect segmentation results in both null FDRbp and FNRbp , as 
well as both null FDR(f) and FNR(f) is equivalent to a perfect selection of 
the functions. 

The averages of these criteria over the 100 simulations are considered. Note that we 
also look at the estimation of the standard deviation of the noise a. 

4.2 Results 

Both procedures have been implemented in R (development core team, 2009) on 
quad-core processors 2.8GHz Intel X5560 Xeons with 8MB cache size and 32 GB 
total physical memory. In particular, the standard Metropolis-Hastings algorithm for 
procedure SegBayesSP took 169 minutes, for the 400 simulations (100 series for 4 
noises, so it took approximately 25 seconds for each simulation). 

In the following, we first analyse the results obtained with the 100 simulated series 
and then we draw some remarks from the study of a particular series. 
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Overall results and comparison between SegBayes SP and SegBayes P. 

Figure 1 shows the average quality criteria over the 100 simulated series for the 4 
different values of er, and Figure 2 gives the average of the estimates a. 

When the detection problem is easy (a = 0.1), SegBayes SP tends to recover correctly 
the two parts of the model. Indeed the selected number of segments is close to the 
true one and the change-points are well positioned (small FDRbp and FNRbp). The 
same occurs with the selected functions. When the functional part is not taken into 
account, using the procedure SegBayes_P : the segmentation needs to compensate: the 
number of segments is overestimated leading to bad estimation of the segmentation 
part (higher RMSE(p) and FDRbp). This is illustrated in Figure 4 on one simulated 
series where the procedure selects false positive change-points to fit well the series 
(in particular to catch the peaks of the function). The higher FNRbp compared 
to SegBayes SP is explained by the fact that the precise positioning will be better 
obtained when the functional part is well estimated, in particular at positions where 
the jump of the means is small. 

When the noise cr increases, the two procedures tend to underestimate the number 
of segments and for SegBayesSP also the number of functions. These results were 
expected (and generally observed for the segments in segmentation problems). Indeed, 
in this case, the segmentation and the functional part will be confused together and 
also with the errors. To avoid false detection and selection, one may prefer to select 
less change-points and functions. This is particularly marked for the functional part 
for which we observed a big decreasing of the number of selected functions from 
a = 0.5. As a consequence, RMSE(f) increases and FNR(f) is almost close to 
1. Moreover since the functional is not recovered, the two procedures lead to the 
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same results for the estimation of the segmentation part (same values of the different 
criteria for the quality of the segmentation). 

Finally the quality of the estimation of both parts is related to the quality of the 
estimation of a: estimates a are good for a = 0.1 and o = 0.5, but they appear too 
large for cr = 1 and a — 1.5 (see Figure 2). 

An advantage of these Bayesian procedures is that we obtain empirical posterior 
probabilities for the possible change-points and functions. This will be illustrated in 
the following paragraph. 

Results for particular series. We look here in detail at particular series simulated 
as follows: three change-points are considered at positions 7, 18 and 36, the means 
over the four segments are 2, 0, 2 and 3 respectively and we consider two series: one 
with a — 0.1 and the second one with a — 1. 

The results of the procedure SegBayesSP for the first series with a = 0.1 (low noise) 
are given in Figure 3. We observe that, in this case, the true change-points and 
functions are exactly recovered (see plots (a) and (&)) and the two parts of the model 
are well estimated (see plots ( g ) and ( h )) leading to a good estimation of the whole 
(see plot (/)). Note that Table 1 gives the correspondences between the selected 
functions from the dictionary and their indexes, and by convention the change-point 
at time 0 and the constant function are selected. 

The advantage of a Bayesian approach compared to a frequentist one, is that the 
posterior probabilities give some additional interesting information. For instance, the 
posterior probabilities of the change-points at positions 0, 7 and 18 are 1, while those 
of the change-points 35 and 36 are 0.48 and 0.55 respectively (see plots (a) and (6)). 
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Figure 1: Average quality criteria on 100 simulated series with <r = 0.1, a = 0.5, cr = 1 and 
a = 1.5, for SegBayesSP (semi-parametric model) and SegBayes.P (parametric model). 
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sigma estimation 



Figure 2: Average of the estimates a on 100 simulated series with a = 0.1, a = 0.5, 
c j = 1 and a = 1.5, for SegBayesSP (semi-parametric model) and SegBayes.P (parametric 
model). 

Indeed, the choice between change-points 35 and 36 is not so easy for the sampler (see 
the traces in plots (c) and (d)). Observe also that we can deduce from the traces that 
the posterior probability of having a change-point in {35, 36} is close to 1. Moreover, 
let us give another comment. Instead of using a threshold to obtain 7 and r, another 
way could have been to use the posterior modes. However, this approach generally 
leads to more false positive change-points. This has been observed in many series and 
this is also the case in this particular series. Here using the posterior modes results in 
the detection of the true change-points at positions 7 and 18 as previously, but also 
of a change-point at position 35 instead of 36 and of a false positive change-point at 
position 11. 

On the same simulated series, the result of the procedure SegBayes_P (without con¬ 
sidering the functional part) is given in Figure 4. The change-points 7, 18, 36, 59 
and 60 are selected with high posterior probabilities. The two last detected change- 
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points are false positive but they correspond to the Haar 60 from the functional part. 
As explained in the previous paragraph, when the functional part is forgotten, the 
segmentation tends to catch it. 


(a) breakpoints selection 
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Figure 3: Results of SegBayesSP on the particular simulated series with a = 0.1. Posterior 
probabilities for the 7 and r components (plots (a) and (6)). Traces of the 7 components 35, 
36 and 11 (plots (c), (d) and (e)). The series, the whole true expectation and its estimation, 
the True Positive (TP), False Positive (FP) and False Negative (FN) change-points are also 
represented (plot (/)). The true and estimated segmentation part and functional part (plots 
(. 9 ) and (h)). 


As pointed out previously, the case o = 1 is challenging since the jump of 1 (even 2) 
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Figure 4: Result of SegBayes.P on the particular simulated series with a = 0.1: the series, 


the whole true expectation and its estimation, the True Positive (TP), False Positive (FP) 


and False Negative (FN) change-points are also represented. 
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Tabic 1: Functions selected from the dictionary and their corresponding indexes. 
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on the mean of the series become difficult to detect, as well as peaks functions which 
can be confounded with the noise. In Figure 5, we can see that using SegBayesSP 
the posterior probabilities of the true change-points at positions 7 and 18 are 0.60 
and 1 respectively, while the posterior probabilities of the change-point at position 
36 which is not selected is 0.19. Concerning the functional part, only the Haar 60 is 
detected with posterior probability 1. Looking at the series, this is expected since at 
position 36, the jump is not marked. The result of the procedure SegBayes-P on this 
same series is not very good, since only the true change-point 18 is detected, with a 
posterior probability of 0.72. 


Sensitivity and convergence. To study the sensitivity of the estimates 7 and r to 
the choice of prior parameters, we ran the Metropolis-Hastings algorithm on the same 
particular series as before with a = 0 . 1 , with different choices of prior parameters. 
Table B in Appendix B shows the results obtained for several simulation scenarios 
(21 different runs). 

On average, the procedure is not over-sensitive to the choice of the prior parameters: 
among the 21 runs, 10 detected exactly the true change-points and functions, and 
8 runs detected the true change-points and functions with a shift or an “exchange”. 
For instance runs 9 and 13 select change-points at positions 37 and 35 respectively 
instead of the 36. Runs 6 , 11, 16 and 19 select change-point at position 10 instead of 
7, and functions 9 and 10 (Haar 8 and Haar 9) instead of 11 (Haar 10). 

Some other sensitivity remarks can still be done. First we can see that too small 
values for <7 and C 2 should not be used since it results on too many undetected 
change-points and functions (see run 2). Moreover the number of components of 7 
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(a) breakpoints selection 
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Figure 5: Results of SegBayesSP on the particular simulated series with a = 1 . Top: 
posterior probabilities for the 7 and r components. Bottom: the series, the whole true ex¬ 
pectation and its estimation, the True Positive (TP), False Positive (FP) and False Negative 
(FN) change-points are also represented. 
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and r to be changed at each iteration should not be too high. Indeed, in this case, the 
proposed changes are too difficult to accept leading to a poor acceptance rate. For 
instance, for runs 12 and 15, the acceptance rates for 7 and r are respectively 0.05% 
and 0.02% (instead of 3% for the other runs). It results that for these two runs some 
false-positive change-points and functions are selected. On the contrary, the initial 
number of segments and functions and the values of the probabilities 7 p and rjj do 
not seem to influence too much the number and the validity of selected change-points 
and functions from the dictionary (see runs 4 to 9 and 16 to 21). 

To study the convergence, we ran the algorithm three times with the same prior pa¬ 
rameters than run 1, with 20000 iterations (5000 of burn-in), and one time with 50000 
iterations (10000 of burn-in). The results are given in Table 3 in Appendix B. We 
observe that 20000 iterations including 5000 of burn-in seem enough to reach conver¬ 
gence since the obtained results are similar for these four runs (or to the previous runs 
1, 5, 8 , 11, 14, 17 and 20 that have the same prior parameters). The acceptance rates 
for 7 and r are not very high in general (around 3%). But if we look in more detail at 
the traces, it appears that usually when a true change-point or a true function from 
the dictionary is selected, it will be selected until the end of the algorithm, while a 
position which is not a change-point will be alternatively selected and unselected. As 
a consequence, when most of the change-points and bias functions have been selected, 
the chain will not be much updated, resulting in a poor acceptance rate. 
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5 Application 

5.1 Geodetic data 

In this section, we propose to use our procedure in the geodesic held for the problem 
of homogenization of GPS series. Indeed, such series are used to determine accurate 
station velocities for tectonic and Earths mantle studies (King et ah, 2010). However 
they are affected by two effects: (i) abrupt changes that are related to instrumental 
changes (documented or not), earthquakes or changes in the raw data processing 
strategy and (ii) periodic signals that are due to environmental signals, such as soil 
moisture or atmospheric pressure changes. The correct detection of these effects is 
fundamental for the aforementioned application. 

Here we consider a particular series (the height coordinate of the series) from the 
GPS station in Yarragadee, Australia, YAR2 at the weekly scale. The data can be 
downloaded at http://sideshow.jpi.nasa.gov/post/series.htmi. We refer the reader to 
Bertin et al. (2016) for more details about the problem and the data. 

We apply our proposed procedure SegBayesSP to this series with a dictionary of 
194 functions that includes the constant function and the Fourier functions: t t —y 
sin (2'Kw i t ), t (->■ cos (2-KWit) where Wi = i/T, T = max(f) — rnin(f) and T/i is larger 
than 8 weeks. The Metropolis-Hastings algorithm is run for 100 000 iterations (30 
000 burn-in), with c\ = C 2 = 50. The initial number of segments and functions is 5, 
the number of change-points or functions proposed to be changed at each iteration 
is 1. The initial probability for each possible function is 0.01, the initial probability 
for a position not associated with a known equipment change or malfunction is 0.01 
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and the initial probability for a position associated with a known equipment change 
or malfunction is 0.5. Concerning the Gibbs sampler algorithm, we run it for 100 000 
iterations including 50000 burn-in iterations and we choose C\ = C2 = 50. 

In the top of Figure 6, the posterior probabilities of the change-points and functions 
are given. In the bottom of Figure 6, the series, its reconstruction, the validated, un¬ 
reported and missed change-points are plotted (validated means reported in databases 
and detected, unreported means not reported in databases but detected, and missed 
means reported in databases but non detected). The different status of the change- 
points should be tempered since they are based on known equipment changes and 
malfunctions. Some unknown malfunctions should have appeared, hence some unre¬ 
ported change-points could in fact be real change-points. Moreover small earthquakes 
can have been not reported in databases. 

Four change-points previously reported in databases are detected by SegBayesSP. In 
particular, GPS weeks 1016 and 1085 correspond exactly to clock changes, and GPS 
weeks 1689 and 1708 correspond exactly to radom radar changes. The unreported 
change-point at GPS week 1057 maybe associated with the receiver and clock change 
at GPS week 1031. Note that we also apply our procedure with non informative prior 
for the positions of the change-points (all the initial probability equal to 0.01). In 
this case, several documented breakpoints are not yet detected, showing the gain of 
using prior knowledge. 

Examining the selected functions, 4 of them were selected by the procedure: sin ( 27 t x 
^), cos ( 27 T x ^), sin ( 27 r x and sin ( 27 T x T-). These functions furnish relevant 
geodetic information. In particular, the selection of the three functions with periods 
of 52 and 26 weeks is consistent with the fact that atmospheric pressure can be 
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approximated by periodic signals with dominant annual and semi-annual periods 
(Dong et al., 2002). 


Breakpoints selection 


Functions selection 



gamma components 



YAR2 



Figure 6: Result of the procedure SegBayesSP on the YAR2 series. Top: posterior proba¬ 
bilities of the change-points and functions. Bottom: Estimated expectation and validated, 
unreported and missed change-points (based on known equipment changes and malfunc¬ 
tions). 


5.2 Exchange rate data 

In this section, we propose to use our procedure in the econometrics field for the 
problem of the exchange rate. More precisely, we study the daily records of Mexican 
Peso/US Dollar exchange rate from January 2007 to December 2012 (data available 
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at www.federalreserve.gov). These data were studied by Martinez, and Mena 
(2014) with a Bayesian nonparametric method. 

We apply our proposed procedure SegBayesSP to this series with a dictionary of 
23 functions that includes the following functions of time (i i —y P, j = 1,2} and 
the Fourier functions: t i —y sin (27 t it/n) ,t t —y cos (2nit/n) for i = 1,..., 10. The 
Metropolis-Hastings algorithm is run for 100 000 iterations (30 000 burn-in), with 
Ci = C 2 = 50. The initial number of segments and functions are 5, the number of 
change-points or functions proposed to be changed at each iteration is 1. 

The initial probability for each possible function is 0.01, as well as the initial proba¬ 
bility for a position. Concerning the Gibbs sampler algorithm, we run it for 100 000 
iterations including 50000 burn-in iterations and we choose ci = C 2 = 50. 

In the top of Figure 7, the posterior probabilities of the change-points and functions 
are given. In the bottom of Figure 7, the series with both the estimation of the 
expectation and the estimated change-points. The five most probable change-points 
detected are at dates October 03, 2008, January 14, 2009, March 26, 2009, April 3, 
2009 and September 14, 2011. 

These are close to the ones obtained by Martinez, and Mena (2014) and, as ex¬ 
plained in this latter paper, four can be related to events in Mexico and USA: 1) 
September-October 2008: pick of the 2007-2008 financial crisis; 2) March-May 2009: 
the flu pandemic suffered by Mexico; 3) the US debt-ceiling crisis in 2011. The other 
change-points detected by Martinez, and Mena (2014), that do not correspond to 
known events, are not detected by our procedure, certainly due to the presence of the 
functional part. 
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Mexican Peso/US Dollar Exchange rate 



Figure 7: Result of the procedure SegBayesSP on the Mexican Peso/US Dollar exchange 
rate series. Top: posterior probabilities of the change-points and functions. Bottom: Esti¬ 
mated expectation and change-points. 


6 Discussion 

In this paper, we propose a novel Bayesian method to segment one-dimensional 
piecewise-constant signals corrupted by a functional part. The functional part is 
estimated by a linear combination of functions from a dictionary. Since the dictio¬ 
nary can be large, this method is flexible and allows us to estimate functions with 
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both smooth components and local irregularities. For the estimation of the change- 
points, following Harchaoui and Levy-Leduc (2010), the piecewise-constant term is 
expressed as a product of a lower triangular matrix by a sparse vector. This presents 
the advantage that the Stochastic Search Variable Selection approach can be applied, 
avoiding the use of the Reversible Jump method in which the mixing of the chains 
can be problematic (see Lavielle and Lebarbier, 2001). 

The global estimation procedure is based on a Metropolis-Hastings algorithm for esti¬ 
mating the positions of the change-points and for selecting the functions (estimation 
of the latent vectors 7 and r), and on a Gibbs sampler to estimate the parameters 
(3 , A r and a 2 , conditionally on 7 and r. It is quite common in Stochastic Search 
Bayesian Variable Selection to integrate all the parameters except the latent vectors, 
see for instance Georges and McCulloch (1997) or Bottolo and Richardson (2010). 
We obtain good results in this paper in a simulation study and on two real data 
sets. In particular, we illustrate in the simulation study that taking into account the 
functional part improves the detection of the breakpoints. 

Our procedure benefits from the Bayesian framework, which results on two important 
aspects. The first one is that posterior distributions of the parameters are obtained. 
Then, we can have a quantification of the uncertainty through posterior probabili¬ 
ties for the possible change-points and functions composing the functional part, or 
through credible intervals. Credible intervals have not been used in our examples, 
but they can be easily calculated for the means over the segments (/ii,..., for 
the coefficients (Ai,...,Am) of the selected functions or for the functional part /. 
The second important aspect is that we can introduce expert knowledge in the model 
through prior distributions (see Section 5). In particular in the geodetic field, some 
change-points are not detected when non informative priors are used whereas they are 
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detected when previous knowledge is taking into account. Moreover, change-points 
and functions composing the functional part are detected in an automatic way. This 
could provide a significant improvement for the users in this field since previously the 
detection of change-points was done visually. 

An important issue is the choice of the criterion to estimate the parameters 7 and r. 
As proposed by Muller, Parmigiani, Robert and Rousseau (2004), the criterion used 
in this work minimizes a loss function which is the sum of the False Discovery and 
of the False Negative, leading to a threshold of 1/2 for the posterior probabilities. 
In the simulation study, this criterion seems to outperform other strategy based on 
the posterior mode which leads to more false-positive change-points. Moreover other 
thresholds can be used, to minimize different loss functions, see Muller, Parmigiani, 
Robert and Rousseau (2004) or Muller, Parmigiani and Rice (2006). Finally, an other 
way to select the final change-points and functions from the dictionary would be to 
run the algorithm say three times, and to take the intersections of the three results, 
for both the change-points and the bias functions. That would lead to perfect results 
for most of the groups of three runs from the sensitivity analysis. 

To use our procedure, hyper-parameters should be chosen, but the sensitivity analysis 
shows that the procedure is not over-sensitive to these choices. 

Eventually, we think that this kind of approach should not be restricted to the prob¬ 
lem of detecting change-points in a one-dimensional series contaminated by a signal. 
Indeed, it should be interesting and useful to extend this approach to the joint seg¬ 
mentation of multiple series corrupted by a functional part. 
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A Integration of the joint posterior distribution 

Integrating the joint posterior (2.2) with respect to (3 1 we obtain: 

7r(7, A r , r, a 2 |Y) oc (27ra 2 ) _?1//2 (l + Cl ) _d '>' //2 7r(7)7r(r)cY 2 



Integrating with respect to A r , we obtain: 


7r(7, r, cr 2 |Y) oc (27tct 2 ) 2(l + ci) iy ^ 2 n('y)n(r)a 2 



V 





Finally, integrating over cr 2 , we obtain the integrated posterior (3.1). 
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Tabic 2: SegBayesSP: prior parameters used in the different runs of the Metropolis- 
Hastings algorithm applied on the particular series with a = 0.1. The number of iterations 
is 20000 with a burn-in of 5000 iterations for all runs. 




A Bayesian approach for the segmentation of series corrupted by a functional part 33 


Run 

Number of 

iterations 

burn-in 

Selected 

change-points 

Selected 

functions 

22 

20000 

5000 

7, 18, 36 

1, 11, 51, 61, 110 

23 

20000 

5000 

7, 18, 36 

1, 51, 61, 76, 110 

24 

20000 

5000 

8 , 18, 36 

1, 9, 11, 51, 61, 110 

25 

50000 

10000 

7, 18, 36 

1, 11, 51, 61, 110 


Table 3: SegBayesSP: results of four runs of the Metropolis-Hastings algorithm applied 

on the particular series with a = 0.1, with the same prior parameters than run 1 in Table 

2 . 
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